Abstract

To study the preference of infants for contingency of
movements and familiarity of faces during a self-recognition
task, we built a real-time face-swapper for videos that acts
as an accurate and instantaneous imitator.

1. Self-recognition development 

Human neonates detect contingency between their movements
and what they see (Rochat, 2009), but they cannot
discriminate their own image from that of another infant
before 5 months of age. Only by 18 months can they
recognize themselves in a mirror. The 6 to 18 months period
is a decisive developmental stage.

Behavioral studies have shown that 9-month-olds display a
preference for familiar faces similar to themselves
(Sanefuji et al., 2006), but also that 5-month-olds show
differential visual fixation to a contingent video (Bahrick
and Watson, 1985).



2. Unbiased imitator 

We propose to compare the roles of movement contingency and
face familiarity in self-recognition in an experiment where
an imitator reproduces the head, arm and body movements of
the subject with or without delay. The imitator's face may
be identical to the subject's face or look different. We
thus developed a face-swapper for videos that detects the
face position and orientation of the current subject A, then
superimposes the image of a subject B on A's face (fig. 1,
where A and B's faces are identical).




Figure 1: face-swap: B's face is superimposed on A's face 
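
The "with or without delay" condition of the imitator can be
pictured as a fixed-length frame buffer. The following is a
minimal sketch, not the system's actual code: the class name
DelayLine and the choice to express the delay in frames are
assumptions made for illustration.

```python
from collections import deque


class DelayLine:
    """Hypothetical helper: replays frames a fixed number of frames late."""

    def __init__(self, delay_frames):
        if delay_frames < 0:
            raise ValueError("delay_frames must be non-negative")
        self._delay = delay_frames
        self._buffer = deque()

    def push(self, frame):
        """Store the newest frame and return the one captured
        delay_frames earlier, or None while the buffer is still filling."""
        self._buffer.append(frame)
        if len(self._buffer) > self._delay:
            return self._buffer.popleft()
        return None
```

With delay_frames set to 0 the newest frame is returned
immediately, which corresponds to the contingent (no delay)
condition.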

To avoid disturbing the subject's behavior or appearance, we
did not use special markers. The only installation was a
camera. Our non-constraint real-time system integrates
existing 3D head posture trackers with an original
face-swapper for videos. The very short delay of the
face-swapper was achieved through parallel computing,
including General-Purpose computing on Graphics Processing
Units (GPGPU). The novelty of this work also lies in its
easy calibration.

3. Face-swapper for videos

The overall system (fig. 2) includes a head tracker to
determine the head position and orientation of the current
subject A, and a face swapper to replace the face of A with
that of subject B. Its calibration uses only frontal face
pictures of subjects A and B, and the camera video, as
inputs.

Figure 2: The system contains a face tracker and swapper



3.1 3D visual tracker 

Devices measuring the head's pose, such as magnetic sensors,
link mechanisms or motion capture, unfortunately alter the
subjects' behavior or their natural appearance. Non-invasive
alternatives exist, such as faceAPI,
sparse-template-matching-based object tracking (Matsubara
and Shakunaga, 2004) and CAMSHIFT. However, they are either
commercial systems, where the information needed to adapt
them for children and extend them to a face-swapper could be
inaccessible, or they lack robustness. Matsumoto et al.
(2009) propose an estimation of the 6-DOF motion of the face
using a single camera, but require the heavy set-up of a
personal 3D facial model. Lozano and Otsuka (2009) present a
real-time visual tracker based on stream processing and
particle filtering, using a generic 3D model of the face.
Our head tracker also adopts this approach to estimate the
state

$x = (T_x, T_y, \dot{T}_x, \dot{T}_y, S, R_x, R_y, R_z, \dot{R}_y, \alpha)$

where $T_x, T_y$ are the translation coordinates of the
target object, $\dot{T}_x, \dot{T}_y$ are the velocities
along the horizontal x and vertical y axes, $S$ is the
scale, $R_x, R_y, R_z$ are the rotations about each axis,
$\dot{R}_y$ is the velocity of the rotation about the
vertical axis y, and $\alpha$ is a global illumination
variable.
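
To make the role of the state variables concrete, here is a
minimal sketch of one predict / weight / resample cycle of a
generic bootstrap particle filter over such a state. It is
not the tracker described here: the constant-velocity motion
model, the noise level sigma, and the function toy_likelihood
(a stand-in for the sparse-template matching score) are all
assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HeadState:
    tx: float = 0.0     # translation along x
    ty: float = 0.0     # translation along y
    vtx: float = 0.0    # translation velocity along x
    vty: float = 0.0    # translation velocity along y
    scale: float = 1.0  # S
    rx: float = 0.0     # rotation about x
    ry: float = 0.0     # rotation about y
    rz: float = 0.0     # rotation about z
    vry: float = 0.0    # rotation velocity about y
    alpha: float = 1.0  # global illumination


def predict(s, dt, rng, sigma=0.1):
    """Assumed motion model: constant velocity plus Gaussian diffusion."""
    def noise():
        return rng.gauss(0.0, sigma)
    return replace(
        s,
        tx=s.tx + s.vtx * dt + noise(),
        ty=s.ty + s.vty * dt + noise(),
        vtx=s.vtx + noise(),
        vty=s.vty + noise(),
        scale=s.scale + noise(),
        rx=s.rx + noise(),
        ry=s.ry + s.vry * dt + noise(),
        rz=s.rz + noise(),
        vry=s.vry + noise(),
        alpha=s.alpha + noise(),
    )


def toy_likelihood(s, target_tx=5.0):
    """Hypothetical observation model: peaks when tx matches target_tx."""
    return math.exp(-0.5 * (s.tx - target_tx) ** 2)


def filter_step(particles, likelihood, dt, rng):
    """Propagate every particle, weight it, then resample by weight."""
    predicted = [predict(p, dt, rng) for p in particles]
    weights = [likelihood(p) for p in predicted]
    if sum(weights) <= 0.0:
        return predicted  # no particle explains the observation
    return rng.choices(predicted, weights=weights, k=len(predicted))


def mean_tx(particles):
    """Point estimate of the horizontal translation."""
    return sum(p.tx for p in particles) / len(particles)
```

In a real tracker the likelihood would compare image
templates under each hypothesized pose, and that per-particle
evaluation is the part that lends itself to parallel
execution.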

Our tracker relies on multi-processing and
sparse-template-based particle filtering. Instead of a 3D
face model, we use a simpler ellipsoid model. The real-time
constraint is met by running the camera capture,
head-tracking, face-swapping and results-display threads in
parallel. Moreover, the computation of the particle filter
was sped up by the use of GPGPU and NVIDIA CUDA.
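
The thread layout described above can be sketched as a chain
of stages connected by small bounded queues. This is only an
illustration of the structure under stated assumptions: the
names stage, capture and run_pipeline are hypothetical, the
track and swap callables are placeholders, and the display
step is reduced to collecting results in the calling thread.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def stage(work, inbox, outbox):
    """Apply work to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(item))


def capture(frames, outbox):
    """Stand-in for the camera thread: feeds frames into the pipeline."""
    for frame in frames:
        outbox.put(frame)
    outbox.put(STOP)


def run_pipeline(frames, track, swap):
    # Small bounded queues keep latency low: a slow stage makes
    # the upstream stage wait instead of piling up stale frames.
    captured = queue.Queue(maxsize=2)
    tracked = queue.Queue(maxsize=2)
    swapped = queue.Queue(maxsize=2)
    threads = [
        threading.Thread(target=capture, args=(frames, captured)),
        threading.Thread(target=stage, args=(track, captured, tracked)),
        threading.Thread(target=stage, args=(swap, tracked, swapped)),
    ]
    for t in threads:
        t.start()
    displayed = []  # stand-in for the results-display thread
    while True:
        item = swapped.get()
        if item is STOP:
            break
        displayed.append(item)
    for t in threads:
        t.join()
    return displayed
```

Because each stage is a single thread reading from a FIFO
queue, frames leave the pipeline in the order they were
captured.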

3.2 Face-swapper

Once the face position and orientation of A are detected, an
image of B is superimposed on A's face. Replacement of whole
faces in still images has been developed only recently (Zhu
et al., 2009; Bitouk et al., 2008). However, we target
videos, which add real-time constraints as well as
continuity and movement factors.

Our system first automatically creates a set of replacement
faces of subject B and tags each of them with its position
and orientation x. The face-swapper thread compares the
current state parameters of A with those of the replacement
faces of B and selects the closest replacement face to
superimpose on A's face. To render temporal continuity, the
replacement face is interpolated before superimposition, so
that the replacement looks dynamic. We thus obtain a
complete system for automatically replacing faces in videos
that renders the dynamic movements of the head.
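
The selection step can be sketched as a nearest-neighbour
lookup in pose space. The sketch below is an illustration
only: the ReplacementFace structure, the use of Euclidean
distance over rotation angles, and the reading of
"interpolated" as a distance-weighted blend of the two
closest faces are all assumptions, not the method described
here.

```python
import math
from dataclasses import dataclass


@dataclass
class ReplacementFace:
    pose: tuple   # (rx, ry, rz) in degrees, tagged at calibration
    pixels: list  # flattened grayscale image, stand-in for a bitmap


def closest_faces(bank, pose, k=1):
    """Return the k replacement faces whose tagged pose is nearest."""
    if not bank:
        raise ValueError("the bank of replacement faces is empty")
    return sorted(bank, key=lambda f: math.dist(f.pose, pose))[:k]


def interpolated_face(bank, pose):
    """Blend the two nearest faces, weighting the nearer one more."""
    near = closest_faces(bank, pose, k=2)
    if len(near) == 1:
        return list(near[0].pixels)
    first, second = near
    d1 = math.dist(first.pose, pose)
    d2 = math.dist(second.pose, pose)
    if d1 + d2 == 0.0:
        return list(first.pixels)
    w = d2 / (d1 + d2)  # weight given to the nearest face
    return [w * p + (1.0 - w) * q
            for p, q in zip(first.pixels, second.pixels)]
```

A linear scan over the bank is adequate for a small set of
replacement faces; a larger set would call for an index over
pose space.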

3.3 Performance 



A demonstration video can be found at
http://www.youtube.com/watch?v=qtY14o4QoIo



The real-time constraint was the greatest challenge. The use
of GPGPU and parallel processing decreased the delay to
99 ms. In addition, our face-swapper is robust against
background distractors such as other faces in the background
(fig. 1). The head-tracker can detect a wide range of face
orientations, with a head pitch angle of up to 70 degrees,
and is robust against partial occlusion, such as when
children bring their toys or hands to their mouths or faces.

The system was evaluated against a motion capture system,
with an adult subject moving at normal speed and a head
rotation ranging from -40° to 40°. The pitch angles measured
by the motion capture system and by ours show the same
variations. The average error is 9°.
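
An average error of this kind can be computed as the mean
absolute difference between the two angle series. The sketch
below assumes that definition, which the text does not
specify, and that both series are sampled at the same
instants.

```python
def mean_absolute_error(reference, estimate):
    """Mean absolute difference, in degrees, between two angle series
    of equal length (assumed to be sampled at the same instants)."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(r - e) for r, e in zip(reference, estimate))
    return total / len(reference)
```

Here reference would hold the motion capture angles and
estimate the angles reported by the tracker.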



4. Conclusion 

We presented a non-constraint face-swapper based on 3D
visual tracking that achieves real-time performance through
parallel computing. Our imitator system is particularly
suited for experiments involving children with Autistic
Spectrum Disorder, who are often strongly disturbed by the
constraints of other methods (Frith, 2003). It can estimate
their attention point during natural social interaction to
study their peculiar attention pattern, or be used as an
imitator to evaluate how imitation facilitates their social
behaviors. Future improvements can focus on facial
expressions.

We plan to conduct our experiment with children to
investigate the importance of the contingency and
familiarity factors in self-recognition. In the longer term,
the results could be compared with neuroscientific data,
such as the activation in the frontal lobe of the right
hemisphere (Uddin et al., 2005) or the default network (Goa
et al., 2009), to model the development of
self-consciousness, both for the analysis of human
self-consciousness and for the implementation of a robotic
sense of self-consciousness.